Abstract

The emergence of low-cost sensor architectures for diverse modalities has made it possible 
to deploy sensor arrays that capture a single event from a large number of vantage points and 
using multiple modalities. In many scenarios, these sensors acquire very high-dimensional data 
such as audio signals, images, and video. To cope with such high-dimensional data, we typi- 
cally rely on low-dimensional models. Manifold models provide a particularly powerful model 
that captures the structure of high-dimensional data when it is governed by a low-dimensional 
set of parameters. However, these models do not typically take into account dependencies 
among multiple sensors. We thus propose a new joint manifold framework for data ensembles 
that exploits such dependencies. We show that simple algorithms can exploit the joint manifold 
structure to improve their performance on standard signal processing applications. Addition- 
ally, recent results concerning dimensionality reduction for manifolds enable us to formulate 
a network-scalable data compression scheme that uses random projections of the sensed data. 
This scheme efficiently fuses the data from all sensors through the addition of such projections, 
regardless of the data modalities and dimensions. 



1 Introduction 

The geometric notion of a low-dimensional manifold is a common, yet powerful, tool for modeling
high-dimensional data. Manifold models arise in cases where (i) a $K$-dimensional parameter $\theta$ can
be identified that carries the relevant information about a signal and (ii) the signal
$x_\theta \in \mathbb{R}^N$ changes as a continuous (typically nonlinear) function of these parameters.
Some typical examples include
a one-dimensional (1-D) signal shifted by an unknown time delay (parameterized by the translation
variable), a recording of a speech signal (parameterized by the underlying phonemes spoken by the
speaker), and an image of a 3-D object at an unknown location captured from an unknown viewing
angle (parameterized by the 3-D coordinates of the object and its roll, pitch, and yaw). In these
and many other cases, the geometry of the signal class forms a nonlinear $K$-dimensional manifold
in $\mathbb{R}^N$,

$$\mathcal{M} = \{ f(\theta) : \theta \in \Theta \}, \tag{1}$$

where $\Theta$ is the $K$-dimensional parameter space [1-3]. Low-dimensional manifolds have also been
proposed as approximate models for nonparametric signal classes such as images of human faces
or handwritten digits [4-6].

In many scenarios, multiple observations of the same event may be performed simultaneously, 
resulting in the acquisition of multiple manifolds that share the same parameter space. For ex- 
ample, sensor networks — such as camera networks or microphone arrays — typically observe 
a single event from a variety of vantage points, while the underlying phenomenon can often be 
described by a set of common global parameters (such as the location and orientation of the ob- 
jects of interest). Similarly, when sensing a single phenomenon using multiple modalities, such as 
video and audio, the underlying phenomenon may again be described by a single parameterization 
that spans all modalities. In such cases, we will show that it is advantageous to model this joint 
structure contained in the ensemble of manifolds as opposed to simply treating each manifold in- 
dependently. Thus we introduce the concept of the joint manifold: a model for the concatenation of 
the data vectors observed by the group of sensors. Joint manifolds enable the development of im- 
proved manifold-based learning and estimation algorithms that exploit this structure. Furthermore, 
they can be applied to data of any modality and dimensionality. 

In this work we conduct a careful examination of the theoretical properties of joint manifolds. 
In particular, we compare joint manifolds to their component manifolds to see how quantities like 
geodesic distances, curvature, branch separation, and condition number are affected. We then ob- 
serve that these properties lead to improved performance and noise-tolerance for a variety of signal 
processing algorithms when they exploit the joint manifold structure, as opposed to processing data 
from each manifold separately. We also illustrate how this joint manifold structure can be exploited 
through a simple and efficient data fusion algorithm that uses random projections, which can also 
be applied to multimodal data. 

Related prior work has studied manifold alignment, where the goal is to discover maps be- 
tween several datasets that are governed by the same underlying low-dimensional structure. Lafon 
et al. proposed an algorithm to obtain a one-to-one matching between data points from several 
manifold-modeled classes [7]. The algorithm first applies dimensionality reduction using diffu- 
sion maps to obtain data representations that encode the intrinsic geometry of the class. Then, an 
affine function that matches a set of landmark points is computed and applied to the remainder of 
the datasets. This concept was extended by Wang and Mahadevan, who apply Procrustes analysis 
on the dimensionality-reduced datasets to obtain an alignment function between a pair of
manifolds [8]. Since an alignment function is provided instead of a data point matching, the mapping
obtained is applicable for the entire manifold rather than for the set of sampled points. In our
setting, we assume that either (i) the manifold alignment is provided intrinsically via synchronization
between the different sensors or (ii) the manifolds have been aligned using one of the approaches
described above. Our main focus is a theoretical analysis of the benefits provided by analyzing the 
joint manifold versus solving our task of interest separately on each of the manifolds observed by 
individual sensors. 

This paper is organized as follows. Section 2 introduces and establishes some basic properties
of joint manifolds. Section 3 considers the application of joint manifolds to the tasks of
classification and manifold learning. Section 4 then describes an efficient method for processing and
aggregating data when it lies on a joint manifold, and Section 5 concludes with discussion.

2 Joint manifolds 

In this section we develop a theoretical framework for ensembles of manifolds which are jointly 
parameterized by a small number of common degrees of freedom. Informally, we propose a data 
structure for jointly modeling such ensembles; this is obtained by concatenating points from dif- 
ferent ensembles that are indexed by the same articulation parameter to obtain a single point in 
a higher-dimensional space. We begin by defining the joint manifold for the general setting of
arbitrary topological manifolds.¹

Definition 2.1. Let $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ be an ensemble of $J$ topological manifolds of equal
dimension $K$. Suppose that the manifolds are homeomorphic to each other, in which case there exists a
homeomorphism $\psi_j$ between $\mathcal{M}_1$ and $\mathcal{M}_j$ for each $j$. For a particular set of mappings
$\{\psi_j\}_{j=2}^{J}$, we define the joint manifold as

$$\mathcal{M}^* = \{ (p_1, p_2, \ldots, p_J) \in \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J : p_j = \psi_j(p_1),\ 2 \le j \le J \}.$$

Furthermore, we say that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are the corresponding component manifolds.

Notice that $\mathcal{M}_1$ serves as a common parameter space for all the component manifolds. Since
the component manifolds are homeomorphic to each other, this choice is ultimately arbitrary. In
practice it may be more natural to think of each component manifold as being homeomorphic to
some fixed $K$-dimensional parameter space $\Theta$. However, in this case one could still define $\mathcal{M}^*$
as is done above by defining $\psi_j$ as the composition of the homeomorphic mappings from $\mathcal{M}_1$ to $\Theta$
and from $\Theta$ to $\mathcal{M}_j$.

As an example, consider the one-dimensional manifolds in Figure 1. Figures 1(a) and (b) show
two isomorphic manifolds, where $\mathcal{M}_1 = (0, 2\pi)$ is an open interval, and
$\mathcal{M}_2 = \{ \psi_2(\theta) : \theta \in \mathcal{M}_1 \}$ where $\psi_2(\theta) = (\cos\theta, \sin\theta)$, i.e.,
$\mathcal{M}_2 = S^1 \setminus \{(1, 0)\}$ is a circle with one point removed (so that it
remains isomorphic to a line segment). In this case the joint manifold
$\mathcal{M}^* = \{ (\theta, \cos\theta, \sin\theta) : \theta \in (0, 2\pi) \}$, illustrated in Figure 1(c), is a helix.
Notice that there exist other possible homeomorphic mappings from $\mathcal{M}_1$ to $\mathcal{M}_2$, and that the
precise structure of the joint manifold as a submanifold of $\mathbb{R}^3$ is heavily dependent on the
choice of this mapping.
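The helix example can be sketched numerically. The snippet below is a minimal illustration only: the helper names (`psi2`, `joint_point`, `sample_joint_manifold`) and the uniform sampling of the parameter are assumptions made for this sketch, not part of the paper. It concatenates a point of the interval with its image on the circle to obtain a point of the joint manifold.

```python
import math

def psi2(theta):
    """Map a point of M_1 = (0, 2*pi) to the punctured unit circle M_2."""
    return (math.cos(theta), math.sin(theta))

def joint_point(theta):
    """Concatenate p_1 = theta with p_2 = psi2(theta) to get a point of M*."""
    return (theta,) + psi2(theta)

def sample_joint_manifold(num_samples):
    """Sample the helix at interior points of (0, 2*pi); the endpoints are excluded."""
    step = 2.0 * math.pi / (num_samples + 1)
    return [joint_point(step * (i + 1)) for i in range(num_samples)]

helix = sample_joint_manifold(5)
for point in helix:
    print(tuple(round(c, 3) for c in point))
```

Choosing a different homeomorphism in place of `psi2` would produce a different curve in three dimensions, which is the dependence on the mapping noted above.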

¹ A comprehensive introduction to topological manifolds can be found in Boothby [9].

(a) $\mathcal{M}_1 \subset \mathbb{R}$: line segment (b) $\mathcal{M}_2 \subset \mathbb{R}^2$: circle segment (c) $\mathcal{M}^* \subset \mathbb{R}^3$: helix segment

Figure 1: A pair of isomorphic manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$, and the resulting joint manifold $\mathcal{M}^*$.

Returning to the definition of $\mathcal{M}^*$, observe that although we have called $\mathcal{M}^*$ the joint manifold,
we have not shown that it actually forms a topological manifold. To prove that $\mathcal{M}^*$ is indeed a
manifold, we will make use of the fact that the joint manifold is a subset of the product manifold
$\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$. One can show that the product manifold forms a $JK$-dimensional manifold
using the product topology [9]. By comparison, we now show that $\mathcal{M}^*$ has dimension only $K$.

Proposition 2.1. $\mathcal{M}^*$ is a $K$-dimensional submanifold of $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$.

Proof. We first observe that since $\mathcal{M}^*$ is a subset of the product manifold, we automatically have
that $\mathcal{M}^*$ is a second countable Hausdorff topological space. Thus, all that remains is to show
that $\mathcal{M}^*$ is locally homeomorphic to $\mathbb{R}^K$. Let $p = (p_1, p_2, \ldots, p_J)$ be an arbitrary point on $\mathcal{M}^*$.
Since $p_1 \in \mathcal{M}_1$, we have a pair $(U_1, \phi_1)$ such that $U_1 \subset \mathcal{M}_1$ is an open set containing $p_1$ and
$\phi_1 : U_1 \to V$ is a homeomorphism where $V$ is an open set in $\mathbb{R}^K$. We now define for $2 \le j \le J$
$U_j = \psi_j(U_1)$ and $\phi_j = \phi_1 \circ \psi_j^{-1} : U_j \to V$. Note that for each $j$, $U_j$ is an open set and $\phi_j$ is a
homeomorphism (since $\psi_j$ is a homeomorphism).

Now define $U^* = (U_1 \times U_2 \times \cdots \times U_J) \cap \mathcal{M}^*$. Observe that $U^*$ is an open set and that $p \in U^*$.
Furthermore, let $q = (q_1, q_2, \ldots, q_J)$ be any element of $U^*$. Then
$\phi_j(q_j) = \phi_1 \circ \psi_j^{-1}(q_j) = \phi_1(q_1)$
for each $2 \le j \le J$. Thus, since the image of each $q_j \in U_j$ in $V$ under their corresponding $\phi_j$ is
the same, we can form a single homeomorphism $\phi^* : U^* \to V$ by assigning $\phi^*(q) = \phi_1(q_1)$. This
shows that $\mathcal{M}^*$ is locally homeomorphic to $\mathbb{R}^K$ as desired. □

Since $\mathcal{M}^*$ is a submanifold of $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$, it also inherits some desirable properties
from its component manifolds.

Proposition 2.2. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are isomorphic topological manifolds and $\mathcal{M}^*$ is
defined as above.

1. If $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian, then $\mathcal{M}^*$ is Riemannian.

2. If $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are compact, then $\mathcal{M}^*$ is compact.

Proof. The proofs of these facts are straightforward and follow from the fact that if the component
manifolds are Riemannian or compact, then the product manifold will be as well. $\mathcal{M}^*$ then inherits
these properties as a submanifold of the product manifold [9]. □

Up to this point we have considered general topological manifolds. In particular, we have not
assumed that the component manifolds are embedded in any particular space. If each component
manifold $\mathcal{M}_j$ is embedded in $\mathbb{R}^{N_j}$, the joint manifold is naturally embedded in $\mathbb{R}^{N^*}$ where
$N^* = \sum_{j=1}^{J} N_j$. Hence, the joint manifold can be viewed as a model for data of varying ambient
dimension linked by a common parametrization. In the sequel, we assume that each manifold $\mathcal{M}_j$
is embedded in $\mathbb{R}^N$, which implies that $\mathcal{M}^* \subset \mathbb{R}^{JN}$. Observe that while the intrinsic dimension of
the joint manifold remains constant at $K$, the ambient dimension increases by a factor of $J$. We
now examine how a number of geometric properties of the joint manifold compare to those of the
component manifolds.

We begin with the following simple observation that Euclidean distances between points on
the joint manifold are larger than distances on the component manifolds. In the remainder of this
paper, whenever we use the notation $\|\cdot\|$ we mean $\|\cdot\|_{\ell_2}$, i.e., the $\ell_2$ (Euclidean) norm on $\mathbb{R}^N$.
When we wish to differentiate this from other $\ell_p$ norms, we will be explicit.

Proposition 2.3. Let $p = (p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ be two points on the joint
manifold $\mathcal{M}^*$. Then

$$\|p - q\| = \sqrt{\sum_{j=1}^{J} \|p_j - q_j\|^2}.$$

Proof. This follows from the definition of the Euclidean norm:

$$\|p - q\|^2 = \sum_{i=1}^{JN} (p(i) - q(i))^2 = \sum_{j=1}^{J} \sum_{i=1}^{N} (p_j(i) - q_j(i))^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2.$$

□
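The identity between the joint Euclidean distance and the component distances can be checked on a small example. This is a sketch under stated assumptions: the two component points are taken from the interval and circle of the helix example, and the helper names `euclidean` and `joint_distance` are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def joint_distance(p_parts, q_parts):
    """Joint distance assembled from the per-component distances."""
    return math.sqrt(sum(euclidean(pj, qj) ** 2 for pj, qj in zip(p_parts, q_parts)))

# Component points for J = 2 (interval and circle), sharing the parameters 0.5 and 2.0.
p_parts = [(0.5,), (math.cos(0.5), math.sin(0.5))]
q_parts = [(2.0,), (math.cos(2.0), math.sin(2.0))]

# Concatenate the components into points of the joint manifold.
p = tuple(x for part in p_parts for x in part)
q = tuple(x for part in q_parts for x in part)

direct = euclidean(p, q)
from_components = joint_distance(p_parts, q_parts)
print(direct, from_components)
```

Both values agree up to rounding, and the joint distance is at least as large as each component distance.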



While Euclidean distances are important (especially when noise is introduced), the natural 
measure of distance between a pair of points on a Riemannian manifold is not Euclidean distance, 
but rather the geodesic distance. The geodesic distance between points $p, q \in \mathcal{M}$ is defined as

$$d_{\mathcal{M}}(p, q) = \inf \{ L(\gamma) : \gamma(0) = p,\ \gamma(1) = q \}, \tag{2}$$

where $\gamma : [0, 1] \to \mathcal{M}$ is a $C^1$-smooth curve joining $p$ and $q$, and $L(\gamma)$ is the length of $\gamma$ as
measured by

$$L(\gamma) = \int_0^1 \|\dot{\gamma}(t)\| \, dt. \tag{3}$$

In order to see how geodesic distances on $\mathcal{M}^*$ compare to geodesic distances on the component
manifolds, we will make use of the following lemma.

Lemma 2.1. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian manifolds, and let $\gamma : [0, 1] \to \mathcal{M}^*$
be a $C^1$-smooth curve on the joint manifold. Then we can write $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_J)$ where each
$\gamma_j : [0, 1] \to \mathcal{M}_j$ is a $C^1$-smooth curve on $\mathcal{M}_j$, and

$$\frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j) \le L(\gamma) \le \sum_{j=1}^{J} L(\gamma_j).$$

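Proof. We begin by observing that

$$L(\gamma) = \int_0^1 \|\dot{\gamma}(t)\| \, dt = \int_0^1 \sqrt{\sum_{j=1}^{J} \|\dot{\gamma}_j(t)\|^2} \, dt. \tag{4}$$

For a fixed $t$, let $x_j = \|\dot{\gamma}_j(t)\|$, and observe that $x = (x_1, x_2, \ldots, x_J)$ is a vector in $\mathbb{R}^J$. Thus we may
apply the standard norm inequalities

$$\frac{1}{\sqrt{J}} \|x\|_{\ell_1} \le \|x\|_{\ell_2} \le \|x\|_{\ell_1} \tag{5}$$

to obtain

$$\frac{1}{\sqrt{J}} \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\| \le \sqrt{\sum_{j=1}^{J} \|\dot{\gamma}_j(t)\|^2} \le \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\|. \tag{6}$$

Combining the right-hand side of (6) with (4) we obtain

$$L(\gamma) \le \int_0^1 \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\| \, dt = \sum_{j=1}^{J} \int_0^1 \|\dot{\gamma}_j(t)\| \, dt = \sum_{j=1}^{J} L(\gamma_j).$$

Similarly, from the left-hand side of (6) we obtain

$$L(\gamma) \ge \int_0^1 \frac{1}{\sqrt{J}} \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\| \, dt = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} \int_0^1 \|\dot{\gamma}_j(t)\| \, dt = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j).$$

□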
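The two-sided length bound of Lemma 2.1 can be illustrated with polyline approximations of the curves. The sketch below rests on assumptions that are not from the paper: the example curves (a unit interval and one turn of the unit circle), the sampling density, and the helper names `polyline_length` and `lemma_bounds`.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def polyline_length(points):
    """Approximate a curve length by summing consecutive segment lengths."""
    return sum(euclidean(a, b) for a, b in zip(points, points[1:]))

def lemma_bounds(component_curves):
    """Return (lower bound, joint length, upper bound) for sampled component curves."""
    J = len(component_curves)
    lengths = [polyline_length(curve) for curve in component_curves]
    # Concatenate the j-th samples of every component into one joint sample.
    joint_curve = [sum(parts, ()) for parts in zip(*component_curves)]
    return sum(lengths) / math.sqrt(J), polyline_length(joint_curve), sum(lengths)

ts = [i / 2000 for i in range(2001)]
gamma1 = [(t,) for t in ts]                                                   # length 1
gamma2 = [(math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)) for t in ts]  # length close to 2*pi

lower, joint_len, upper = lemma_bounds([gamma1, gamma2])
print(lower, joint_len, upper)
```

Because the two component curves have different speeds here, both inequalities are strict; the joint length is close to $\sqrt{1 + 4\pi^2}$.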



We are now in a position to compare geodesic distances on $\mathcal{M}^*$ to those on the component
manifolds.

Theorem 2.1. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian manifolds. Let $p = (p_1, p_2, \ldots, p_J)$
and $q = (q_1, q_2, \ldots, q_J)$ be two points on the corresponding joint manifold $\mathcal{M}^*$. Then

$$d_{\mathcal{M}^*}(p, q) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^{J} d_{\mathcal{M}_j}(p_j, q_j). \tag{7}$$

If the mappings $\psi_2, \psi_3, \ldots, \psi_J$ are isometries, i.e., $d_{\mathcal{M}_1}(p_1, q_1) = d_{\mathcal{M}_j}(\psi_j(p_1), \psi_j(q_1))$ for any $j$
and for any pair of points $(p, q)$, then

$$d_{\mathcal{M}^*}(p, q) = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} d_{\mathcal{M}_j}(p_j, q_j) = \sqrt{J} \cdot d_{\mathcal{M}_1}(p_1, q_1). \tag{8}$$

Proof. If $\gamma$ is a geodesic path between $p$ and $q$, then from Lemma 2.1,

$$d_{\mathcal{M}^*}(p, q) = L(\gamma) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j).$$

By definition $L(\gamma_j) \ge d_{\mathcal{M}_j}(p_j, q_j)$; hence, this establishes (7).

Now observe that the lower bound in Lemma 2.1 is derived from the lower inequality of (6). This
inequality is attained with equality if and only if each term in the sum is equal, i.e., $L(\gamma_j) = L(\gamma_k)$
for all $j$ and $k$. This is precisely the case when $\psi_2, \psi_3, \ldots, \psi_J$ are isometries. Thus we obtain

$$d_{\mathcal{M}^*}(p, q) = L(\gamma) = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j) = \sqrt{J} \, L(\gamma_1).$$

We now conclude that $L(\gamma_1) = d_{\mathcal{M}_1}(p_1, q_1)$ since if we could obtain a shorter path $\gamma_1$ from $p_1$ to
$q_1$ this would contradict the assumption that $\gamma$ is a geodesic on $\mathcal{M}^*$, which establishes (8). □
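The isometric case can be checked on the helix of Figure 1(c), where the map from the interval to the circle preserves arc length. This is a rough numerical sketch: the function name `helix_geodesic`, the chosen parameter values, and the polyline approximation of the geodesic are assumptions for illustration.

```python
import math

def helix_geodesic(theta_a, theta_b, num_segments=20000):
    """Approximate the geodesic length on the helix between two parameter values."""
    thetas = [theta_a + (theta_b - theta_a) * i / num_segments
              for i in range(num_segments + 1)]
    pts = [(t, math.cos(t), math.sin(t)) for t in thetas]
    return sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
        for u, v in zip(pts, pts[1:])
    )

theta_a, theta_b = 0.5, 3.0
d_component = abs(theta_b - theta_a)        # geodesic distance on the interval (and on the arc)
d_joint = helix_geodesic(theta_a, theta_b)  # geodesic distance on the joint manifold
print(d_joint, math.sqrt(2) * d_component)
```

With $J = 2$ the joint geodesic distance matches $\sqrt{2}$ times the component distance, up to discretization error.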

Next, we study local smoothness and global self avoidance properties of the joint manifold 
using the notion of condition number. 

Definition 2.2. [10] Let $\mathcal{M}$ be a Riemannian submanifold of $\mathbb{R}^N$. The condition number is
defined as $1/\tau$, where $\tau$ is the largest number satisfying the following: the open normal bundle
about $\mathcal{M}$ of radius $r$ is embedded in $\mathbb{R}^N$ for all $r < \tau$.

The condition number of a given manifold controls both local smoothness properties and global
properties of the manifold. Intuitively, as $1/\tau$ becomes smaller, the manifold becomes smoother
and more self-avoiding. This is made more precise in the following lemmata.

Lemma 2.2. [10] Suppose $\mathcal{M}$ has condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two distinct points on
$\mathcal{M}$, and let $\gamma(t)$ denote a unit speed parameterization of the geodesic path joining $p$ and $q$. Then

$$\max_t \|\ddot{\gamma}(t)\| \le \frac{1}{\tau}.$$

Lemma 2.3. [10] Suppose $\mathcal{M}$ has condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two points on $\mathcal{M}$ such
that $\|p - q\| = d$. If $d \le \tau/2$, then the geodesic distance $d_{\mathcal{M}}(p, q)$ is bounded by

$$d_{\mathcal{M}}(p, q) \le \tau \left( 1 - \sqrt{1 - 2d/\tau} \right).$$

We wish to show that if the component manifolds are smooth and self avoiding, the joint
manifold is as well. It is not easy to prove this in the most general case, where the only assumption is
that there exists a homeomorphism (i.e., a continuous bijective map $\psi$) between every pair of
manifolds. However, suppose the manifolds are diffeomorphic, i.e., there exists a continuous bijective
map between tangent spaces at corresponding points on every pair of manifolds. In that case, we
make the following assertion.

Theorem 2.2. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian submanifolds of $\mathbb{R}^N$, and let $1/\tau_j$
denote the condition number of $\mathcal{M}_j$. Suppose also that the $\psi_2, \psi_3, \ldots, \psi_J$ that define the
corresponding joint manifold $\mathcal{M}^*$ are diffeomorphisms. If $1/\tau^*$ is the condition number of $\mathcal{M}^*$, then

$$\tau^* \ge \min_{1 \le j \le J} \tau_j,$$

or equivalently,

$$\frac{1}{\tau^*} \le \max_{1 \le j \le J} \frac{1}{\tau_j}.$$

Proof. Let $p \in \mathcal{M}^*$, which we can write as $p = (p_1, p_2, \ldots, p_J)$ with $p_j \in \mathcal{M}_j$. Since the
$\{\psi_j\}_{j=2}^{J}$ are diffeomorphisms, we may view $\mathcal{M}^*$ as being diffeomorphic to $\mathcal{M}_1$, i.e., we can build
a diffeomorphic map from $\mathcal{M}_1$ to $\mathcal{M}^*$ as

$$p = \psi^*(p_1) := (p_1, \psi_2(p_1), \ldots, \psi_J(p_1)).$$

We also know that given any two manifolds linked by a diffeomorphism $\psi_j : \mathcal{M}_1 \to \mathcal{M}_j$,
each vector $v_1$ in the tangent space $T_1(p_1)$ of the manifold $\mathcal{M}_1$ at the point $p_1$ is uniquely mapped
to a tangent vector $v_j := \phi_j(v_1)$ in the tangent space $T_j(p_j)$ of the manifold $\mathcal{M}_j$ at the point
$p_j = \psi_j(p_1)$ through the map $\phi_j := \mathcal{J} \circ \psi_j(p_1)$, where $\mathcal{J}$ denotes the Jacobian operator.

Consider the application of this property to the diffeomorphic manifolds $\mathcal{M}_1$ and $\mathcal{M}^*$. In this
case, the tangent vector $v_1 \in T_1(p_1)$ to the manifold $\mathcal{M}_1$ can be uniquely identified with a tangent
vector $v = \phi^*(v_1) \in T^*(p)$ to the manifold $\mathcal{M}^*$. This mapping is expressed as

$$\phi^*(v_1) = \mathcal{J} \circ \psi^*(p_1) = (v_1, \mathcal{J} \circ \psi_2(p_1), \ldots, \mathcal{J} \circ \psi_J(p_1)),$$

since the Jacobian operates componentwise. Therefore, the tangent vector $v$ can be written as

$$v = \phi^*(v_1) = (v_1, \phi_2(v_1), \ldots, \phi_J(v_1)) = (v_1, v_2, \ldots, v_J).$$

In other words, a tangent vector to the joint manifold can be decomposed into $J$ component vectors,
each of which is tangent to the corresponding component manifold.

Using this fact, we now show that a vector $\eta$ that is normal to $\mathcal{M}^*$ can also be broken down into
sub-vectors that are normal to the component manifolds. Consider $p \in \mathcal{M}^*$, and denote $T^*(p)^\perp$ as
the normal space at $p$. Suppose $\eta = (\eta_1, \ldots, \eta_J) \in T^*(p)^\perp$. Decompose each $\eta_j$ as a projection
onto the component tangent and normal spaces, i.e., for $j = 1, \ldots, J$,

$$\eta_j = x_j + y_j, \quad x_j \in T_j(p_j), \quad y_j \in T_j(p_j)^\perp,$$

such that $\langle x_j, y_j \rangle = 0$ for each $j$. Let $x = (x_1, \ldots, x_J)$ and $y = (y_1, \ldots, y_J)$. Then $\eta = x + y$, and
since $x$ is tangent to the joint manifold $\mathcal{M}^*$, we have $\langle \eta, x \rangle = \langle x + y, x \rangle = 0$, and thus

$$\langle y, x \rangle = -\|x\|^2.$$

But,

$$\langle y, x \rangle = \sum_{j=1}^{J} \langle y_j, x_j \rangle = 0.$$

Hence $x = 0$, i.e., each $\eta_j$ is normal to $\mathcal{M}_j$.

Figure 2: Point at which the normal bundle for the helix manifold intersects itself.

Armed with this last fact, our goal now is to show that if $r < \min_{1 \le j \le J} \tau_j$ then the normal
bundle of radius $r$ is embedded in $\mathbb{R}^{JN}$, or equivalently, that $p + \eta \ne q + \nu$ provided that
$\|\eta\|, \|\nu\| < r$. Indeed, suppose $\|\eta\|, \|\nu\| < r < \min_{1 \le j \le J} \tau_j$. Since $\|\eta_j\| \le \|\eta\|$ and
$\|\nu_j\| \le \|\nu\|$ for all $1 \le j \le J$, we have that $\|\eta_j\|, \|\nu_j\| < \min_{1 \le i \le J} \tau_i \le \tau_j$. Since we have
proved that $\eta_j, \nu_j$ are vectors in the normal bundle of $\mathcal{M}_j$ and their magnitudes are less than $\tau_j$,
then $p_j + \eta_j \ne q_j + \nu_j$ by the definition of condition number. Thus $p + \eta \ne q + \nu$ and the result
follows. □

This result states that for general manifolds, the most we can say is that the condition number
of the joint manifold is guaranteed to be no larger than that of the worst manifold. However, in
practice this is not likely to happen. As an example, Figure 2 illustrates the point at which the normal
bundle intersects itself for the case of the joint manifold from Figure 1(c). In this case we obtain
$\tau^* = \sqrt{\pi^2/2 + 1}$. Note that the condition numbers for the manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$ generating $\mathcal{M}^*$
are given by $\tau_1 = \infty$ and $\tau_2 = 1$. Thus, while the condition number in this case is not as good as
the best manifold, it is still notably better than the worst manifold. In general, even this example
may be somewhat pessimistic, and it is possible that in many cases the joint manifold may be better
conditioned than even the best manifold.

3 Joint manifolds in signal processing

Manifold models can be exploited by a number of algorithms for signal processing tasks such
as pattern classification, learning, and control [11]. The performance of such algorithms often
depends on geometric properties of the manifold model such as its condition number and geodesic
distances along its surface. The theory developed in Section 2 suggests that the joint manifold
preserves or improves these properties. We will now see that when noise is introduced these results
suggest that, in the case of multiple data sources, it can be extremely beneficial to use algorithms
specifically designed to exploit the joint manifold structure.

3.1 Classification 

We first study the problem of manifold-based classification. The problem is defined as follows:
given manifolds $\mathcal{M}$ and $\mathcal{N}$, suppose we observe a signal $y = x + n \in \mathbb{R}^N$ where either $x \in \mathcal{M}$ or
$x \in \mathcal{N}$ and $n$ is a noise vector, and we wish to find a function $f : \mathbb{R}^N \to \{\mathcal{M}, \mathcal{N}\}$ that attempts
to determine which manifold "generated" $y$. We consider a simple classification algorithm based
on the generalized maximum likelihood framework described in [12]. The approach is to classify
by computing the distance from the observed signal $y$ to each of the manifolds, and then classify
based on which of these distances is smallest, i.e., our classifier is

$$f(y) = \arg\min \, [\, d(y, \mathcal{M}),\ d(y, \mathcal{N}) \,]. \tag{9}$$

We will measure the performance of this algorithm for a particular pair of manifolds by considering
the probability of misclassifying a point from $\mathcal{M}$ as belonging to $\mathcal{N}$, which we denote $P_{\mathcal{M}\mathcal{N}}$.
To analyze this problem, we employ three common notions of separation in metric spaces:

• The minimum separation distance between two manifolds $\mathcal{M}$ and $\mathcal{N}$ is defined as

$$\delta(\mathcal{M}, \mathcal{N}) = \inf_{p \in \mathcal{M}} d(p, \mathcal{N}).$$

• The Hausdorff distance from $\mathcal{M}$ to $\mathcal{N}$ is defined to be

$$D(\mathcal{M}, \mathcal{N}) = \sup_{p \in \mathcal{M}} d(p, \mathcal{N}),$$

with $D(\mathcal{N}, \mathcal{M})$ defined similarly. Note that $\delta(\mathcal{M}, \mathcal{N}) = \delta(\mathcal{N}, \mathcal{M})$, while in general
$D(\mathcal{M}, \mathcal{N}) \ne D(\mathcal{N}, \mathcal{M})$.

• The maximum separation distance between manifolds $\mathcal{M}$ and $\mathcal{N}$ is defined as

$$\Delta(\mathcal{M}, \mathcal{N}) = \sup_{x \in \mathcal{M}} \sup_{y \in \mathcal{N}} \|x - y\|.$$

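As one might expect, $P_{\mathcal{M}\mathcal{N}}$ is controlled by the separation distances. For example, suppose that
$x \in \mathcal{M}$; if the noise vector $n$ is bounded and satisfies $\|n\| < \delta(\mathcal{M}, \mathcal{N})/2$, then we have that
$d(y, \mathcal{M}) \le \|n\| < \delta(\mathcal{M}, \mathcal{N})/2$ and hence

$$\begin{aligned}
\delta(\mathcal{M}, \mathcal{N}) &= \inf_{p \in \mathcal{M},\, q \in \mathcal{N}} \|p - q\| \\
&= \inf_{p \in \mathcal{M},\, q \in \mathcal{N}} \|p - y + y - q\| \\
&\le \inf_{p \in \mathcal{M},\, q \in \mathcal{N}} \left( \|p - y\| + \|y - q\| \right) \\
&= d(y, \mathcal{M}) + d(y, \mathcal{N}) \\
&< \delta(\mathcal{M}, \mathcal{N})/2 + d(y, \mathcal{N}).
\end{aligned}$$

Thus we are guaranteed that

$$d(y, \mathcal{N}) > \delta(\mathcal{M}, \mathcal{N})/2.$$

Therefore, $d(y, \mathcal{M}) < d(y, \mathcal{N})$ and the classifier defined by (9) satisfies $P_{\mathcal{M}\mathcal{N}} = 0$. We can refine
this result in two possible ways. First, note that the amount of noise $\epsilon$ that we can tolerate without
making an error depends on $x$. Specifically, for a given $x \in \mathcal{M}$, provided that $\|n\| < d(x, \mathcal{N})/2$ we
still have that $P_{\mathcal{M}\mathcal{N}} = 0$. Thus, for a given $x \in \mathcal{M}$ we can tolerate noise bounded by
$d(x, \mathcal{N})/2 \in [\delta(\mathcal{M}, \mathcal{N})/2,\ D(\mathcal{M}, \mathcal{N})/2]$.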
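The nearest-manifold rule and the noise-tolerance argument can be sketched on finite samples. Everything specific here is an assumption for illustration: the two concentric circles, the sampling density, the noise vector, and the helper names (`dist_to_manifold`, `classify`, `min_separation`). Manifolds are represented by finite point sets, so the distances are sample-based approximations.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dist_to_manifold(y, samples):
    """Distance from y to a manifold represented by a finite set of samples."""
    return min(euclidean(y, s) for s in samples)

def classify(y, samples_m, samples_n):
    """Return 'M' or 'N' according to which sampled manifold is closer to y."""
    return "M" if dist_to_manifold(y, samples_m) <= dist_to_manifold(y, samples_n) else "N"

def min_separation(samples_m, samples_n):
    """Sample-based minimum separation distance between two manifolds."""
    return min(dist_to_manifold(p, samples_n) for p in samples_m)

# Hypothetical manifolds: concentric circles of radius 1 and 2.
angles = [2 * math.pi * i / 360 for i in range(360)]
M = [(math.cos(a), math.sin(a)) for a in angles]
N = [(2 * math.cos(a), 2 * math.sin(a)) for a in angles]

delta = min_separation(M, N)
x = M[45]
noise = (0.3, -0.2)                      # norm is about 0.36, below delta / 2
y = (x[0] + noise[0], x[1] + noise[1])
label = classify(y, M, N)
print(delta, label)
```

Since the noise norm stays below half the minimum separation, the observation is assigned to the manifold that generated it.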

A second possible refinement that we will explore below is to ignore this dependence on $x$, but
to extend our noise model to the case where $\|n\| \ge \delta(\mathcal{M}, \mathcal{N})/2$ with non-zero probability. We can
still bound $P_{\mathcal{M}\mathcal{N}}$ since

$$P_{\mathcal{M}\mathcal{N}} \le P(\|n\| \ge \delta(\mathcal{M}, \mathcal{N})/2). \tag{10}$$

We provide bounds on this probability for both the component manifolds and the joint manifold
as follows: we first compare the separation distances for these cases.

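Theorem 3.1. Consider the joint manifolds $\mathcal{M}^* \subset \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$ and
$\mathcal{N}^* \subset \mathcal{N}_1 \times \mathcal{N}_2 \times \cdots \times \mathcal{N}_J$. Then, the following bounds hold:

1. Joint minimum separation:

$$\sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j) \le \delta^2(\mathcal{M}^*, \mathcal{N}^*) \le \min_{1 \le k \le J} \left( \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \Delta^2(\mathcal{M}_j, \mathcal{N}_j) \right). \tag{11}$$

2. Joint Hausdorff separation from $\mathcal{M}^*$ to $\mathcal{N}^*$:

$$\max_{1 \le k \le J} \left( D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j) \right) \le D^2(\mathcal{M}^*, \mathcal{N}^*) \le \sum_{j=1}^{J} \Delta^2(\mathcal{M}_j, \mathcal{N}_j). \tag{12}$$

3. Joint maximum separation from $\mathcal{M}^*$ to $\mathcal{N}^*$:

$$\max_{1 \le k \le J} \left( \Delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j) \right) \le \Delta^2(\mathcal{M}^*, \mathcal{N}^*) \le \sum_{j=1}^{J} \Delta^2(\mathcal{M}_j, \mathcal{N}_j). \tag{13}$$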



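Proof. Inequality (11) is a simple corollary of Proposition 2.3. Let $p = (p_1, p_2, \ldots, p_J)$ and
$q = (q_1, q_2, \ldots, q_J)$ respectively be the points on $\mathcal{M}^*$ and $\mathcal{N}^*$ for which the minimum
separation distance $\delta(\mathcal{M}^*, \mathcal{N}^*)$ is attained, i.e.,

$$(p, q) = \arg \inf_{p \in \mathcal{M}^*} \inf_{q \in \mathcal{N}^*} \|p - q\|.$$

Then,

$$\delta^2(\mathcal{M}^*, \mathcal{N}^*) = \|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2 \ge \sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j),$$

since the distance between two points in any given component space is at least the minimum
separation distance corresponding to that space. This establishes the lower bound in (11). We
obtain the upper bound by selecting a $k$, and selecting $p \in \mathcal{M}^*$ and $q \in \mathcal{N}^*$ such that $p_k$ and $q_k$
attain the minimum separation distance $\delta(\mathcal{M}_k, \mathcal{N}_k)$. From the definition of $\delta(\mathcal{M}^*, \mathcal{N}^*)$, we have
that

$$\begin{aligned}
\delta^2(\mathcal{M}^*, \mathcal{N}^*) &\le \|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2 \\
&= \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \|p_j - q_j\|^2 \\
&\le \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \Delta^2(\mathcal{M}_j, \mathcal{N}_j),
\end{aligned}$$

and since this holds for every choice of $k$, (11) follows by taking the minimum over all $k$.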

To prove inequality (12), we follow a similar course. We begin by selecting $p \in \mathcal{M}^*$ and
$q \in \mathcal{N}^*$ that satisfy

$$(p, q) = \arg \sup_{p \in \mathcal{M}^*} \inf_{q \in \mathcal{N}^*} \|p - q\|.$$

Then,

$$D^2(\mathcal{M}^*, \mathcal{N}^*) = \|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2 \le \sum_{j=1}^{J} \Delta^2(\mathcal{M}_j, \mathcal{N}_j),$$

which establishes the upper bound in (12). To obtain the lower bound, we again select a $k$, and
now let $p \in \mathcal{M}^*$ be the point at which the Hausdorff separation for
the component manifold $\mathcal{M}_k$ is attained, i.e., the corresponding point $p_k$ is as far away from $\mathcal{N}_k$
as is possible in $\mathcal{M}_k$. Let $q \in \mathcal{N}^*$ be the nearest point in $\mathcal{N}^*$ to $p$. From the definition of the
Hausdorff distance, we get that

$$D(\mathcal{M}^*, \mathcal{N}^*) \ge \|p - q\|,$$

since the Hausdorff distance is the maximal distance between the points in $\mathcal{M}^*$ and their respective
nearest neighbors in $\mathcal{N}^*$. Again, it also follows that

$$\begin{aligned}
D^2(\mathcal{M}^*, \mathcal{N}^*) \ge \|p - q\|^2 &= \|p_k - q_k\|^2 + \sum_{j \ne k} \|p_j - q_j\|^2 \\
&= D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \|p_j - q_j\|^2 \\
&\ge D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j).
\end{aligned}$$

Since this again holds for every choice of $k$, (12) follows by taking the maximum over all $k$.

One can prove (13) using the same technique used to prove (12). □
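The joint minimum separation bounds can be checked on sampled manifolds, for which the same argument applies to finite point sets. This sketch is illustrative only: the two hypothetical sensors, their component manifolds, and the helper names `min_sep` and `max_sep` are assumptions, not taken from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def min_sep(A, B):
    """Sample-based minimum separation distance."""
    return min(euclidean(a, b) for a in A for b in B)

def max_sep(A, B):
    """Sample-based maximum separation distance."""
    return max(euclidean(a, b) for a in A for b in B)

thetas = [2 * math.pi * i / 60 for i in range(60)]
# Component manifolds seen by two hypothetical sensors (J = 2), sharing the parameter theta.
M1 = [(math.cos(t), math.sin(t)) for t in thetas]
N1 = [(math.cos(t) + 2.5, math.sin(t)) for t in thetas]
M2 = [(t,) for t in thetas]
N2 = [(t + 0.5,) for t in thetas]

# Joint manifolds: concatenate components that share the same parameter value.
M_joint = [m1 + m2 for m1, m2 in zip(M1, M2)]
N_joint = [n1 + n2 for n1, n2 in zip(N1, N2)]

delta = [min_sep(M1, N1), min_sep(M2, N2)]
Delta = [max_sep(M1, N1), max_sep(M2, N2)]
delta_joint = min_sep(M_joint, N_joint)

lower = sum(d ** 2 for d in delta)
upper = min(delta[k] ** 2 + sum(Delta[j] ** 2 for j in range(2) if j != k) for k in range(2))
print(lower, delta_joint ** 2, upper)
```

In this example the component pairs attain their minimum separations at different parameter values, so the joint separation is noticeably larger than the lower bound.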

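As an example, if we consider the case where the separation distances are constant for all $j$,
then the joint minimum separation distance satisfies

$$\begin{aligned}
\sqrt{J} \, \delta(\mathcal{M}_1, \mathcal{N}_1) \le \delta(\mathcal{M}^*, \mathcal{N}^*) &\le \sqrt{\delta^2(\mathcal{M}_1, \mathcal{N}_1) + (J - 1) \Delta^2(\mathcal{M}_1, \mathcal{N}_1)} \\
&\le \delta(\mathcal{M}_1, \mathcal{N}_1) + \sqrt{J - 1} \, \Delta(\mathcal{M}_1, \mathcal{N}_1).
\end{aligned}$$

In the case where $\delta(\mathcal{M}_1, \mathcal{N}_1) \ll \Delta(\mathcal{M}_1, \mathcal{N}_1)$ then we observe that $\delta(\mathcal{M}^*, \mathcal{N}^*)$ can be considerably
larger than $\sqrt{J} \, \delta(\mathcal{M}_1, \mathcal{N}_1)$. This means that we can potentially tolerate much more noise while
ensuring $P_{\mathcal{M}^*\mathcal{N}^*} = 0$. To see this, write $n = (n_1, n_2, \ldots, n_J)$ and recall that we require
$\|n_j\| < \epsilon = \delta(\mathcal{M}_j, \mathcal{N}_j)/2$ to ensure that $P_{\mathcal{M}_j\mathcal{N}_j} = 0$. Thus, if we require that
$P_{\mathcal{M}_j\mathcal{N}_j} = 0$ for all $j$, then we have that

$$\|n\| = \sqrt{\sum_{j=1}^{J} \|n_j\|^2} < \sqrt{J} \, \epsilon = \sqrt{J} \, \delta(\mathcal{M}_1, \mathcal{N}_1)/2.$$

However, if we instead only require that $P_{\mathcal{M}^*\mathcal{N}^*} = 0$ we only need $\|n\| < \delta(\mathcal{M}^*, \mathcal{N}^*)/2$, which
can be a significantly less stringent requirement.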
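As a small worked example of the constant-separation case, the following sketch evaluates the two bounds on the joint minimum separation. The function name `joint_separation_bounds` and the numerical values are hypothetical and chosen only to show how wide the range can be when the minimum separation is much smaller than the maximum separation.

```python
import math

def joint_separation_bounds(delta, Delta, J):
    """Bounds on the joint minimum separation when all component pairs share delta and Delta."""
    lower = math.sqrt(J) * delta
    upper = math.sqrt(delta ** 2 + (J - 1) * Delta ** 2)
    return lower, upper

lower, upper = joint_separation_bounds(delta=0.1, Delta=5.0, J=4)
# Tolerable noise norm if every component classifier must be error-free, versus the
# largest tolerance the joint classifier could have according to the upper bound.
print(lower / 2, upper / 2)
```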

The benefit of classification using the joint manifold is made more apparent when we extend
our noise model to the case where we allow $\|n_j\| \ge \delta(\mathcal{M}_j, \mathcal{N}_j)/2$ with non-zero probability and
apply (10). To bound the probability in (10), we will make use of the following adaptation of
Hoeffding's inequality [13].

Lemma 3.1. Suppose that $n_j \in \mathbb{R}^N$ is a random vector that satisfies $\|n_j\| \le \epsilon$, for $j = 1, 2, \ldots, J$.
Suppose also that the $n_j$ are independent and identically distributed (i.i.d.) with $E[\|n_j\|] = \sigma$.
Then if $n = (n_1, n_2, \ldots, n_J) \in \mathbb{R}^{JN}$, we have that for any $\lambda > 0$,

$$P\left( \|n\|^2 \ge J(\sigma^2 + \lambda) \right) \le \exp\left( -\frac{2 J \lambda^2}{\epsilon^4} \right).$$

Using this lemma we can relax the assumption on $\epsilon$ so that we only require that it is finite,
and instead make the weaker assumption that $E[\|n\|] = \sqrt{J} \sigma < \delta(\mathcal{M}, \mathcal{N})/2$ for a particular pair
of manifolds $\mathcal{M}$, $\mathcal{N}$. This assumption ensures that $\lambda = \delta^2(\mathcal{M}, \mathcal{N})/4J - \sigma^2 > 0$, so that we can
combine Lemma 3.1 with (10) to obtain a bound on $P_{\mathcal{M}\mathcal{N}}$. Note that if this condition does not
hold, then this is a very difficult classification problem since the expected norm of the noise is
large enough to push us closer to the other manifold, in which case the simple classifier given by
(9) makes little sense.
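The way the tail bound turns into a misclassification bound can be sketched as follows. This is only an illustration under an explicit assumption: that the bound of Lemma 3.1 takes the standard Hoeffding form $\exp(-2J\lambda^2/\epsilon^4)$ with $\lambda = \delta^2/(4J) - \sigma^2$. The function name and the example values are hypothetical.

```python
import math

def hoeffding_misclassification_bound(delta, sigma, eps, J=1):
    """Evaluate exp(-2 * J * lam**2 / eps**4) with lam = delta**2 / (4 * J) - sigma**2.

    Assumes the Hoeffding-type form of the tail bound; raises if lam is not positive,
    i.e., if sqrt(J) * sigma is not strictly below delta / 2.
    """
    lam = delta ** 2 / (4 * J) - sigma ** 2
    if lam <= 0:
        raise ValueError("requires sqrt(J) * sigma < delta / 2")
    return math.exp(-2.0 * J * lam ** 2 / eps ** 4)

bound = hoeffding_misclassification_bound(delta=2.0, sigma=0.5, eps=1.0, J=1)
print(bound)
```

The explicit check on `lam` mirrors the remark above: when the expected noise norm is too large relative to the separation, the bound is not applicable.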

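We now illustrate how Lemma 3.1 can be used to compare error bounds between classification
using a joint manifold and classification using a particular pair of component manifolds.

Theorem 3.2. Suppose that we observe a vector $y = x + n$ where $x \in \mathcal{M}^*$ and $n = (n_1, n_2, \ldots, n_J)$
is a random vector such that $\|n_j\| \le \epsilon$, for $j = 1, 2, \ldots, J$, and that the $n_j$ are i.i.d. with
$E[\|n_j\|] = \sigma < \delta(\mathcal{M}_k, \mathcal{N}_k)/2$. If

$$\delta(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\delta(\mathcal{M}^*, \mathcal{N}^*)}{\sqrt{J}}, \tag{14}$$

and we classify the observation $y$ according to (9), then

$$P_{\mathcal{M}^*\mathcal{N}^*} \le \exp\left( -\frac{2 c^*}{\epsilon^4} \right), \tag{15}$$

and

$$P_{\mathcal{M}_k\mathcal{N}_k} \le \exp\left( -\frac{2 c_k}{\epsilon^4} \right), \tag{16}$$

such that

$$c^* > c_k.$$

Proof. First, observe that

$$\frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \ge \delta^2(\mathcal{M}_k, \mathcal{N}_k) > 4\sigma^2. \tag{17}$$

Thus, we may set $\lambda = \delta^2(\mathcal{M}^*, \mathcal{N}^*)/4J - \sigma^2 > 0$ and apply Lemma 3.1 to obtain (15) with

$$c^* = J \left( \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{4J} - \sigma^2 \right)^2.$$

Similarly, we may again apply Lemma 3.1 by setting $\lambda = \delta^2(\mathcal{M}_k, \mathcal{N}_k)/4 - \sigma^2 > 0$ and $J = 1$ to
obtain (16) with

$$c_k = \left( \frac{\delta^2(\mathcal{M}_k, \mathcal{N}_k)}{4} - \sigma^2 \right)^2.$$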
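It remains to show that $c^* > c_k$. Thus, observe that

$$\begin{aligned}
\delta^2(\mathcal{M}_k, \mathcal{N}_k) &\le \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \\
&= \frac{\sqrt{J} \, \delta^2(\mathcal{M}^*, \mathcal{N}^*) - (\sqrt{J} - 1) \, \delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \\
&= \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{\sqrt{J}} - (\sqrt{J} - 1) \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \\
&< \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{\sqrt{J}} - 4\sigma^2 (\sqrt{J} - 1),
\end{aligned}$$

where the last inequality follows from (17). Rearranging terms, we obtain

$$\frac{\delta^2(\mathcal{M}_k, \mathcal{N}_k)}{4} - \sigma^2 < \sqrt{J} \left( \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{4J} - \sigma^2 \right).$$

Thus,

$$\sqrt{c_k} < \sqrt{c^*},$$

and since $c_k > 0$ by assumption, we obtain

$$c_k < c^*,$$

as desired. □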
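The comparison of the two exponents can be checked numerically. The sketch below assumes the constants have the form $c^* = J(\delta^2(\mathcal{M}^*, \mathcal{N}^*)/4J - \sigma^2)^2$ and $c_k = (\delta^2(\mathcal{M}_k, \mathcal{N}_k)/4 - \sigma^2)^2$; the function name and the example values are hypothetical and were picked so that the hypotheses of Theorem 3.2 hold.

```python
import math

def error_exponents(delta_joint, delta_k, sigma, J):
    """Return (c_star, c_k) for the joint and the k-th component classifier."""
    c_star = J * (delta_joint ** 2 / (4 * J) - sigma ** 2) ** 2
    c_k = (delta_k ** 2 / 4 - sigma ** 2) ** 2
    return c_star, c_k

J = 4
delta_k, delta_joint, sigma = 1.0, 2.5, 0.3

# Hypotheses: sigma < delta_k / 2 and delta_k <= delta_joint / sqrt(J).
assert sigma < delta_k / 2
assert delta_k <= delta_joint / math.sqrt(J)

c_star, c_k = error_exponents(delta_joint, delta_k, sigma, J)
print(c_star, c_k)
```

A larger constant gives a smaller exponential bound, so in this example the bound for the joint classifier is tighter than the bound for the component classifier.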
This result can be weakened slightly to obtain the following corollary. 

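Corollary 3.1. Suppose that we observe a vector $y = x + n$ where $x \in \mathcal{M}^*$ and $n = (n_1, n_2, \ldots, n_J)$
is a random vector such that $\|n_j\| \le \epsilon$, for $j = 1, 2, \ldots, J$, and that the $n_j$ are i.i.d. with
$E[\|n_j\|] = \sigma < \delta(\mathcal{M}_k, \mathcal{N}_k)/2$. If

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j)}{J - 1}, \tag{18}$$

and we classify according to (9), then (15) and (16) hold with the same constants as in Theorem 3.2.

Proof. We can rewrite (18) as

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j) - \delta^2(\mathcal{M}_k, \mathcal{N}_k)}{J - 1}.$$

After rearranging terms, this reduces to

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j)}{J}.$$

Applying (11) from Theorem 3.1 we obtain

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J},$$

which allows us to apply Theorem 3.2 to prove the desired result. □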
Corollary 3.1 shows that we can expect joint classification to outperform the $k$-th individual
classifier whenever the squared separation distance for the $k$-th component manifolds is not
too much larger than the average squared separation distance among the remaining component
manifolds. Thus, we can expect the joint classifier to outperform most of the individual
classifiers, although some of the individual classifiers may still do better. Of course,
if one knew in advance which classifiers were best, then one would use only the data
from the best sensors. We expect the more typical situation to be one in which the separation
distances are (approximately) equal across all sensors, in which case the condition in (18) holds
for all of the component manifolds.
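
As a small numerical illustration of the condition in Corollary 3.1, the sketch below checks, for each sensor $k$, whether its squared separation distance is at most the average squared separation distance of the remaining sensors. The function name `satisfies_condition_18` and the numerical values are hypothetical, chosen only for illustration; they do not come from any experiment in this paper.

```python
import numpy as np

def satisfies_condition_18(sq_sep, k):
    """Check whether component k has squared separation distance at most
    the average squared separation distance of the other components."""
    sq_sep = np.asarray(sq_sep, dtype=float)
    others = np.delete(sq_sep, k)          # drop the k-th entry
    return bool(sq_sep[k] <= others.mean())

# Hypothetical squared separation distances for J = 4 sensors.
sq_sep = [1.0, 1.0, 1.0, 4.0]
flags = [satisfies_condition_18(sq_sep, k) for k in range(len(sq_sep))]
print(flags)
```

In this made-up configuration the condition holds for the three sensors with the smaller separation and fails for the one sensor whose classes are much better separated than the rest, which matches the discussion above: the joint classifier is expected to beat most, but not necessarily all, individual classifiers.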

3.2 Manifold learning 

In contrast to the classification scenario described above, where we knew the manifold structure a
priori, we now consider manifold learning algorithms that attempt to learn the manifold structure
by constructing a (possibly nonlinear) embedding of a given point cloud into a subset of $\mathbb{R}^L$, where
$L < N$. Typically, $L$ is set to $K$, the intrinsic manifold dimension. Several such algorithms have
been proposed, each giving rise to a nonlinear map with its own special properties and advantages
(e.g., Isomap [14], Locally Linear Embedding (LLE) [15], and Hessian Eigenmaps [16]). Such
algorithms provide a powerful framework for navigation, visualization, and interpolation of
high-dimensional data. For instance, manifold learning can be employed to infer the articulation
parameters (e.g., 3-D pose) of points sampled from an image appearance manifold.

In particular, the Isomap algorithm deserves special mention. It assumes that the point cloud
consists of samples from a data manifold that is (at least approximately) isometric to a convex
subset of Euclidean space. In this case, there exists an isometric mapping $f$ from a parameter
space $\Theta \subseteq \mathbb{R}^K$ to the manifold $\mathcal{M}$ such that the geodesic distance between every pair of data
points is equal to the $\ell_2$ distance between their corresponding pre-images in $\Theta$. In essence, Isomap
attempts to discover the coordinate structure of that $K$-dimensional space.

Isomap works in three stages: 

• We construct a graph G that contains one vertex for each input data point; an edge connects 
two vertices if the Euclidean distance between the corresponding data points is below a 
specified threshold. 

• We weight each edge in the graph G by computing the Euclidean distance between the 
corresponding data points. We then estimate the geodesic distance between each pair of 
vertices as the length of the shortest path between the corresponding vertices in the graph G. 

• We embed the points in $\mathbb{R}^K$ using multidimensional scaling (MDS), which attempts to embed
the points so that their Euclidean distances approximate the geodesic distances estimated in
the previous step.

A crucial component of the MDS algorithm is a suitable linear transformation of the matrix of
squared geodesic distances $D$; the rank-$K$ approximation of this new matrix yields the best
possible $K$-dimensional coordinate structure of the input sample points in a mean-squared sense.
Further results on the performance of Isomap in terms of geometric properties of the underlying
manifold can be found in [17].
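
The three stages above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the reference implementation of [14]: the function name `isomap_sketch`, the dense distance matrix, and the semicircle test data are all choices made here for brevity. The MDS step uses the usual double-centering of the squared geodesic distances followed by a rank-$K$ eigendecomposition.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap_sketch(X, K, threshold):
    """Minimal Isomap sketch: threshold graph, graph geodesics, classical MDS."""
    # Stage 1: connect points whose Euclidean distance is below the threshold.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Stage 2: weight edges by Euclidean distance (0 marks "no edge" for
    # dense csgraph input) and estimate geodesics by shortest paths.
    graph = np.where(dist < threshold, dist, 0.0)
    geo = shortest_path(graph, method="D", directed=False)
    if not np.all(np.isfinite(geo)):
        raise ValueError("graph is disconnected; increase the threshold")
    # Stage 3: classical MDS on the squared geodesic distances.
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * H @ (geo ** 2) @ H                # double-centered matrix
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:K]            # K largest eigenvalues
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

# Illustrative data: 50 samples of a semicircle (a 1-D manifold in the plane).
theta = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Y = isomap_sketch(X, K=1, threshold=0.2)
```

On this toy example the recovered 1-D coordinate should vary almost linearly with the arc-length parameter `theta`, up to sign and offset. The dense all-pairs computation is chosen for clarity; it scales quadratically in the number of samples.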

We examine the performance of manifold learning using Isomap with samples of the joint
manifold, as compared to learning any of the component manifolds. We first assume that we are
given noiseless samples from the $J$ isometric component manifolds $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$. In order
to judge the quality of the embedding learned by the Isomap algorithm, we will observe that for
any pair of points $p, q$ on a manifold $\mathcal{M}$, we have that

$$\rho \le \frac{\|p - q\|}{d_{\mathcal{M}}(p, q)} \le 1 \qquad (19)$$

for some $\rho \in [0, 1]$ that will depend on $p, q$. Isomap will perform well if the largest value of $\rho$ that
satisfies (19) for any pair of samples that are connected by an edge in the graph $G$ is close to 1.
Using this result, we can compare the performance of manifold learning using Isomap on samples
from the joint manifold $\mathcal{M}^*$ to using Isomap on samples from a particular component manifold
$\mathcal{M}_k$.

Theorem 3.3. Let $\mathcal{M}^*$ be a joint manifold from $J$ isometric component manifolds. Let $p =
(p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ denote a pair of samples of $\mathcal{M}^*$ and suppose that we
are given a graph $G$ that contains one vertex for each sample. For each $j = 1, \ldots, J$, define $\rho_j$ as
the largest value such that

$$\rho_j \le \frac{\|p_j - q_j\|}{d_{\mathcal{M}_j}(p_j, q_j)} \le 1 \qquad (20)$$

for all pairs of points connected by an edge in $G$. Then we have that

$$\sqrt{\frac{\sum_{j=1}^{J} \rho_j^2}{J}} \le \frac{\|p - q\|}{d_{\mathcal{M}^*}(p, q)} \le 1. \qquad (21)$$



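Proof. By Proposition 2.3,

$$\|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2,$$

and from Theorem 2.1 we have that

$$d^2_{\mathcal{M}^*}(p, q) = J\, d^2_{\mathcal{M}_1}(p_1, q_1).$$

Thus,

$$\frac{\|p - q\|^2}{d^2_{\mathcal{M}^*}(p, q)} = \frac{\sum_{j=1}^{J} \|p_j - q_j\|^2}{J\, d^2_{\mathcal{M}_1}(p_1, q_1)}
 = \frac{1}{J} \sum_{j=1}^{J} \frac{\|p_j - q_j\|^2}{d^2_{\mathcal{M}_j}(p_j, q_j)}
 \ge \frac{1}{J} \sum_{j=1}^{J} \rho_j^2,$$

which establishes the lower bound in (21). The upper bound is trivial since we always have that

$$d_{\mathcal{M}^*}(p, q) \ge \|p - q\|. \qquad □$$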

From Theorem 3.3 we see that, in many cases, the joint manifold estimates of the geodesic
distances will be more accurate than the estimates obtained using one of the component manifolds.
For instance, if for a particular component manifold $\mathcal{M}_k$ we observe that

$$\rho_k \le \sqrt{\frac{\sum_{j=1}^{J} \rho_j^2}{J}},$$

then we know that the joint manifold leads to better estimates. Essentially, we can expect that the
joint manifold will lead to estimates that are better than the average case across the component
manifolds.
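
To make the comparison concrete, the following sketch evaluates the root-mean-square of a set of per-component ratios and flags the components for which the joint manifold is guaranteed to do at least as well. The values of `rho` are hypothetical and the helper name `joint_lower_bound` is introduced here purely for illustration.

```python
import numpy as np

def joint_lower_bound(rho):
    """Root-mean-square of the per-component ratios rho_j."""
    rho = np.asarray(rho, dtype=float)
    return float(np.sqrt(np.mean(rho ** 2)))

# Hypothetical per-component ratios for J = 3 component manifolds.
rho = [0.95, 0.80, 0.90]
bound = joint_lower_bound(rho)
# Components whose own ratio does not exceed the joint lower bound.
improved = [r <= bound for r in rho]
print(round(bound, 4), improved)
```

Because the bound is a root-mean-square, it always lies between the smallest and largest of the individual ratios, which is the sense in which the joint manifold is better than the average case.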

We now consider the case where we have a sufficiently dense sampling of the manifolds so that
the $\rho_j$ are very close to one, and examine the case where we are obtaining noisy samples. We will
assume that the noise affecting the data samples is i.i.d., and demonstrate that any distance
calculation performed on $\mathcal{M}^*$ serves as a better estimator of the pairwise (and consequently, geodesic)
distances between two points labeled by $p$ and $q$ than that performed on any component manifold
between their corresponding points $p_j$ and $q_j$.

Theorem 3.4. Let $\mathcal{M}^*$ be a joint manifold from $J$ isometric component manifolds. Let $p =
(p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ be samples of $\mathcal{M}^*$ and assume that $\|p_j - q_j\| = d$ for all $j$.
Assume that we acquire noisy observations $s = p + n$ and $r = q + n'$, where $n = (n_1, n_2, \ldots, n_J)$
and $n' = (n'_1, n'_2, \ldots, n'_J)$ are independent noise vectors with the same variance and norm bound

$$E[\|n_j\|^2] = \sigma^2 \quad \text{and} \quad \|n_j\|^2 \le \epsilon, \quad j = 1, \ldots, J.$$

Then,

$$P\left( 1 - \delta \le \frac{\|s - r\|^2}{\|p - q\|^2 + 2J\sigma^2} \le 1 + \delta \right) \ge 1 - 2c^{-J},$$

where $c = \exp\left( 2\delta^2 \left( \frac{d^2 + 2\sigma^2}{2d\sqrt{\epsilon} + \epsilon} \right)^2 \right)$.
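Proof. We write the distance between the noisy samples as

$$\|s - r\|^2 = \sum_{j=1}^{J} \left\{ \|p_j - q_j\|^2 + 2\langle p_j - q_j, n_j - n'_j \rangle + \|n_j - n'_j\|^2 \right\}.$$

This can be rewritten as

$$\|s - r\|^2 - \|p - q\|^2 = \sum_{j=1}^{J} \left\{ 2\langle p_j - q_j, n_j - n'_j \rangle + \|n_j - n'_j\|^2 \right\}.$$

We obtain the following statistics for the term inside the sum:

$$E\left[ 2\langle p_j - q_j, n_j - n'_j \rangle + \|n_j - n'_j\|^2 \right] = 2\sigma^2,$$

$$\left| 2\langle p_j - q_j, n_j - n'_j \rangle + \|n_j - n'_j\|^2 \right| \le 2d\sqrt{\epsilon} + \epsilon.$$

Using Hoeffding's inequality, we obtain

$$P\left( \left| \sum_{j=1}^{J} \left\{ 2\langle p_j - q_j, n_j - n'_j \rangle + \|n_j - n'_j\|^2 \right\} - 2J\sigma^2 \right| > J\lambda \right) \le 2 \exp\left( -\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2} \right).$$

This result is rewritten to obtain

$$P\left( \left| \|s - r\|^2 - \|p - q\|^2 - 2J\sigma^2 \right| > J\lambda \right) \le 2 \exp\left( -\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2} \right), \qquad (22)$$

$$P\left( \left| \|s - r\|^2 - \|p - q\|^2 - 2J\sigma^2 \right| \le J\lambda \right) \ge 1 - 2 \exp\left( -\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2} \right). \qquad (23)$$

Simplifying, and using $\|p - q\|^2 = Jd^2$, we get

$$P\left( 1 - \frac{\lambda}{d^2 + 2\sigma^2} \le \frac{\|s - r\|^2}{\|p - q\|^2 + 2J\sigma^2} \le 1 + \frac{\lambda}{d^2 + 2\sigma^2} \right) \ge 1 - 2 \exp\left( -\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2} \right).$$

Setting $\delta = \lambda/(d^2 + 2\sigma^2)$, we obtain the result. □
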



We observe that the estimate of the true distance suffers from a constant small bias; this can
be handled using a simple debiasing step.² Theorem 3.4 indicates that the probability of large
deviations in the estimated distance decreases exponentially in the number of component manifolds
$J$; thus the "denoising" effect in joint manifold learning is manifested even in the case where only
a small number of component manifolds are present.
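
The concentration effect can be checked empirically with a short Monte Carlo sketch. Everything in it is an illustrative assumption rather than part of the analysis above: the noise is drawn uniformly from a sphere of radius `sigma` (so that it is zero-mean, bounded, and has second moment `sigma**2`), each difference $p_j - q_j$ is a fixed vector of norm `d`, and the parameter values are arbitrary.

```python
import numpy as np

def deviation_probability(J, d=1.0, sigma=0.5, delta=0.2, N=16,
                          trials=2000, seed=0):
    """Estimate P(|ratio - 1| > delta), where ratio is the noisy squared
    distance divided by (||p - q||^2 + 2 J sigma^2)."""
    rng = np.random.default_rng(seed)
    # p_j - q_j: a fixed vector of norm d in each of the J components.
    diff = np.zeros((J, N))
    diff[:, 0] = d

    def sphere_noise():
        # Uniform on the sphere of radius sigma: E||n_j||^2 = sigma^2.
        g = rng.standard_normal((trials, J, N))
        return sigma * g / np.linalg.norm(g, axis=-1, keepdims=True)

    s_minus_r = diff[None, :, :] + sphere_noise() - sphere_noise()
    noisy_sq = (s_minus_r ** 2).sum(axis=(1, 2))
    ratio = noisy_sq / (J * d ** 2 + 2 * J * sigma ** 2)
    return float(np.mean(np.abs(ratio - 1.0) > delta))

probs = {J: deviation_probability(J) for J in (1, 4, 16)}
print(probs)
```

Under these assumptions the estimated probability of a relative deviation larger than `delta` should drop quickly as `J` grows, in line with the exponential decay in $J$ described above.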

As an example, we consider three different manifolds formed by images of an ellipse with
major axis $a$ and minor axis $b$ translating in a 2-D plane; an example point is shown in Figure 3.
The eccentricity of the ellipse directly affects the condition number $1/\tau$ of the image articulation
manifold; in fact, it can be shown that articulation manifolds formed by more eccentric ellipses
exhibit higher values for the condition number. Consequently, we expect that it is "harder" to learn
such manifolds.



2 Manifold learning algorithms such as Isomap deal with biased estimates of distances by "centering" the matrix of 
squared distances, i.e., removing the mean of each row/column from every element. 






(i) (a, b) = (7, 7) (ii) (a, b) = (7, 6) (iii) (a, b) = (7, 5) 

Figure 3: Three articulation manifolds embedded in $\mathbb{R}^{4096}$ sharing a common 2-D parameter space $\Theta$.



Figure 4 shows that this is indeed the case. We add a small amount of white Gaussian noise
to each image and apply the Isomap algorithm [14] to both the individual datasets and the
concatenated dataset. We observe that the 2-D embedding is poorly learned for each of the individual
manifolds, but improves visibly when the data ensemble is modeled using a joint manifold.

4 Joint manifolds for efficient dimensionality reduction 

We have shown that joint manifold models for data ensembles can significantly improve the
performance on a variety of signal processing tasks, where performance is quantified using metrics like
probability of error for detection and accuracy for parameter estimation and manifold learning. In
particular, we have observed that performance tends to improve exponentially fast as we increase
the number of component manifolds $J$. However, we have ignored that when $J$ and the ambient
dimension of the manifolds $N$ become large, the dimensionality of the joint manifold, $JN$,
may be so large that it becomes impossible to perform any meaningful computations. Fortunately,
we can transform the data into a more amenable form via the method of random projections: it has
been shown that the essential structure of a $K$-dimensional manifold with condition number $1/\tau$
residing in $\mathbb{R}^N$ is approximately preserved under an orthogonal projection into a random subspace
of dimension $O(K \log(N/\tau)) \ll N$ [18]. This result can be leveraged to enable efficient design of
inference applications, such as classification using multiscale navigation [19], intrinsic dimension
estimation, and manifold learning [20].

We can apply this result individually for each sensor acquiring manifold-modeled data. Suppose
$N$-dimensional data from $J$ component manifolds is available. If $N$ is large, then the above
result would suggest that we project each manifold into a lower-dimensional subspace. By collecting
this data at a central location, we would obtain $J$ vectors, each of dimension $O(K \log N)$, so
that we would have $O(JK \log N)$ total measurements. This approach, however, essentially ignores
the joint manifold structure present in the data. If we instead view the data as arising from a
$K$-dimensional joint manifold residing in $\mathbb{R}^{JN}$ with bounded condition number as given by Theorem
2.2, we can then project the joint data into a subspace whose dimension is only logarithmic in $J$ and
in the largest condition number among the components, and still approximately preserve the manifold
structure. This is formalized in the following theorem.
(iii) Joint manifold

Figure 4: Results of Isomap applied to the translating ellipse image data sets. 

Theorem 4.1. Let $\mathcal{M}^*$ be a compact, smooth, Riemannian joint manifold in a $JN$-dimensional
space with condition number $1/\tau^*$. Let $\Phi$ denote an orthogonal linear mapping from $\mathcal{M}^*$ into
a random $M$-dimensional subspace of $\mathbb{R}^{JN}$. Let $M \ge O(K \log(JN/\tau^*)/\epsilon^2)$. Then, with high
probability, the geodesic and Euclidean distances between any pair of points on $\mathcal{M}^*$ are preserved
up to distortion $\epsilon$ under the linear transformation $\Phi$.

Thus, we obtain a faithful approximation of our manifold-modeled data that is only $O(K \log JN)$
dimensional. This represents a significant improvement over performing separate dimensionality
reduction on each component manifold.
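
For a rough sense of scale, the sketch below compares the two measurement counts while ignoring the constants hidden in the $O(\cdot)$ notation and the dependence on the condition number and on $\epsilon$. The function names and the example values of $J$, $K$, and $N$ are assumptions made here for illustration; the outputs indicate growth rates only and are not usable measurement budgets.

```python
import math

def separate_measurements(J, K, N):
    """Order-of-growth count for projecting each sensor separately:
    J vectors of dimension about K log N."""
    return J * K * math.log(N)

def joint_measurements(J, K, N):
    """Order-of-growth count for projecting the joint manifold:
    a single vector of dimension about K log(JN)."""
    return K * math.log(J * N)

# Hypothetical setting: 100 sensors, 2-D parameter, 64 x 64 images.
J, K, N = 100, 2, 4096
print(separate_measurements(J, K, N), joint_measurements(J, K, N))
```

The separate scheme grows linearly in the number of sensors, whereas the joint scheme grows only logarithmically, and the two coincide when $J = 1$.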

Importantly, the linear nature of the random projection step can be utilized to perform
dimensionality reduction in a distributed manner, which is particularly useful in applications where data
transmission is expensive. As an example, consider a network of $J$ sensors observing an event that
is governed by a $K$-dimensional parameter. Each sensor records a signal $x_j \in \mathbb{R}^N$, $1 \le j \le J$;
the concatenation of the signals $x^* = [x_1^T \; x_2^T \; \cdots \; x_J^T]^T$ lies on a $K$-dimensional joint manifold
$\mathcal{M}^* \subset \mathbb{R}^{JN}$. Since the required random projections are linear, we can take local random
projections of the observed signals at each sensor, and still calculate the global measurements of $\mathcal{M}^*$
in a distributed fashion. Let each sensor obtain its measurements $y_j = \Phi_j x_j$, with the matrices
$\Phi_j \in \mathbb{R}^{M \times N}$, $1 \le j \le J$. Then, by defining the $M \times JN$ matrix $\Phi = [\Phi_1 \; \Phi_2 \; \cdots \; \Phi_J]$, our global
projections $y^* = \Phi x^*$ can be obtained by

$$y^* = \Phi x^* = [\Phi_1 \; \Phi_2 \; \cdots \; \Phi_J] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_J \end{bmatrix} = \Phi_1 x_1 + \Phi_2 x_2 + \cdots + \Phi_J x_J.$$

Thus, the final measurement vector can be obtained by simply adding independent random
projections of the signals acquired by the individual sensors. This method enables a novel scheme
for compressive, multi-modal data fusion; in addition, the number of random projections required
by this scheme is only logarithmic in the number of sensors $J$. Thus, the joint manifold
framework naturally lends itself to a network-scalable data aggregation technique for
communication-constrained applications.
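
The additive fusion step can be verified numerically. In the sketch below, i.i.d. Gaussian matrices are used as a convenient stand-in for the random projections (an assumption made here; the theorem above is stated for orthogonal projections), and the dimensions and signals are arbitrary synthetic values.

```python
import numpy as np

rng = np.random.default_rng(0)
J, N, M = 5, 64, 8                      # sensors, ambient dim, measurements

# Synthetic per-sensor signals and per-sensor projection matrices.
x = [rng.standard_normal(N) for _ in range(J)]
Phi = [rng.standard_normal((M, N)) / np.sqrt(M) for _ in range(J)]

# Each sensor projects locally; the network only has to add M-vectors.
y_local = [Phi_j @ x_j for Phi_j, x_j in zip(Phi, x)]
y_fused = np.sum(y_local, axis=0)

# Centralized alternative: stack everything, then project once.
y_global = np.hstack(Phi) @ np.concatenate(x)

print(np.allclose(y_fused, y_global))
```

The two results agree up to floating-point error, so no sensor ever needs to transmit its full $N$-dimensional signal; only $M$-dimensional partial sums travel through the network.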

5 Discussion 

Joint manifolds naturally capture the structure present in a variety of signal ensembles that arise
from multiple observations of a single event controlled by a small set of global parameters. We
have examined the properties of joint manifolds that are relevant to real-world applications, and
provided some basic examples that illustrate how they improve performance and help reduce
complexity.

We have also introduced a simple framework for dimensionality reduction for joint manifolds 
that employs independent random projections from each signal, which are then added together 
to obtain an accurate low-dimensional representation of the data ensemble. This distributed di- 
mensionality reduction technique resembles the acquisition framework proposed in compressive 
sensing (CS) [21,22]; in fact, prototypes of inexpensive sensing hardware [23,24] that can directly 
acquire random projections of the sensed signals have already been built. Our fusion scheme can 
be directly applied to the data acquired by such sensors. Joint manifold fusion via random pro- 
jections, like CS, is universal in the sense that the measurement process is not dependent on the 
specific structure of the manifold. Thus, our sensing techniques need not be replaced for these 
extensions; only our underlying models (hypotheses) are updated. 

The richness of manifold models allows for the joint manifold approach to be successfully ap- 
plied in a larger class of problems than principal component analysis and other linear model-based 
signal processing techniques. In fact, joint manifolds can be immediately applied in signal pro- 
cessing tasks where manifold models are common, such as detection, classification, and parameter 
estimation. When these tasks are performed in a sensor network or array, and random projections 
of the captured signals can be obtained, joint manifold techniques provide improved performance 
by leveraging the information from all sensors simultaneously. 